YouTube API Design Evaluation and Latency Budget

Introduction#

In the previous lesson, we learned how a streaming service's functional requirements are met. In this lesson, we’ll focus on some interesting aspects of the non-functional requirements of the design. We’ll try to answer some of the common questions that might have come to your mind regarding API performance.

Non-functional requirements#

The subsequent sections discuss how the non-functional requirements are met.

Scalability#

YouTube-like systems need to scale both horizontally and vertically. We keep services loosely coupled by executing independent tasks statelessly and in parallel. Since no single origin can serve an enormous number of users requesting large videos simultaneously, YouTube uses Internet exchange points to populate CDNs and the Google Global Cache (GGC), serving end users close to the network edge.

Hierarchy for serving static viral content to end users

Availability #

In cases of load spikes, such as major events, unexpectedly viral videos, or DDoS attacks, we fan out client requests through a queuing system instead of processing them directly, allowing servers to respond when they have free capacity. This may add some response delay, but the system remains available under these circumstances. We also offload some processing to the client machine, such as managing playtime and other events, by sending only the most necessary events to the server.
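The queuing idea above can be sketched as follows. This is a minimal, illustrative model (the queue bound, status strings, and function names are assumptions, not YouTube's actual implementation): a bounded queue absorbs spikes, servers pull work at their own pace, and excess requests are shed gracefully instead of bringing the system down.

```python
import queue

# A small bound is used purely for demonstration; a real system would size
# the queue against server capacity and acceptable queuing delay.
requests_q = queue.Queue(maxsize=2)

def accept(request):
    """Enqueue instead of processing inline; shed load when the queue is full."""
    try:
        requests_q.put_nowait(request)
        return "202 Accepted"      # queued; the client may see a small delay
    except queue.Full:
        return "503 Retry-After"   # degrade gracefully, stay available

def process_next():
    """A server pulls the next request when it has free capacity."""
    request = requests_q.get_nowait()
    requests_q.task_done()
    return request
```

The tradeoff matches the text: a queued request waits longer than an inline one, but the 503 path keeps overloaded servers responsive rather than letting them fail entirely.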

Additionally, routing popular content through Content Delivery Networks (CDNs) allows us to reduce latency, avoid single points of failure, and increase fault tolerance. Below, we provide the CDN workflow for a client requesting a video trailer. We show how YouTube is able to redirect users to the CDN that serves images under the domain of www.ytimg.com (YouTube images):

  1. A client requests static content, say, a video trailer, from the application server.

  2. The application server returns the URL for the video trailer.

  3. The client asks the DNS server for the IP address of the image server.

  4. The DNS server returns the IP of the nearest CDN server.

  5. The client requests the video trailer from the nearest CDN server.

  6. It is a CDN-MISS, so the CDN requests the video trailer from the application server.

  7. The application server responds with the preview content.

  8. The CDN server responds to the client request and stores the data in its cache.

  9. Another client requests the same resource.

  10. The client gets the IP resolved and requests the data from the CDN server.

  11. The CDN responds with the content from its cache.

Flexibility/adaptability #

Our API supports a wide variety of consumers such as TVs, mobile phones, desktops, etc., and we make sure our services are flexible enough to work across these different devices by transcoding audio/video chunks into the appropriate formats. We also support adaptive bitrate and buffering based on device capabilities to improve user playback experience. Additionally, user playback history is synced to keep logged-in users consistent across devices.

Security #

We assume that the majority of requests sent to YouTube come from unauthenticated users, and we identify these requests using the API key and the session created when the user plays their first video. YouTube also hosts both private and public content. To support access to private content, we have a login mechanism for user authentication and authorization. For third-party clients (sites, applications, etc., that embed YouTube players), we allow authorization using the OAuth 2.0 authorization code flow with PKCE, together with OpenID Connect for authentication.
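The PKCE piece of that flow (RFC 7636) can be sketched in a few lines. The client keeps a random code verifier secret and sends only its SHA-256 challenge with the authorization request; when exchanging the code for a token, it reveals the verifier, and the server checks that it hashes to the stored challenge. The function names below are illustrative, not a real library API.

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Client side: generate a random verifier and its S256 challenge."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

def server_verifies(verifier, challenge):
    """Token endpoint side: re-derive the challenge and compare."""
    digest = hashlib.sha256(verifier.encode()).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode() == challenge
```

Because only the hash travels with the initial request, an attacker who intercepts the authorization code still cannot redeem it without the original verifier, which is what makes PKCE suitable for embedded and third-party clients that cannot keep a client secret.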

Note: Our discussion on the optimization of the file upload API service has some commonalities with the YouTube API service because they deal with uploading (large) files. To avoid repetition, we recommend going through that content as well.

Low latency#

For a streaming service like YouTube, we should expect more read requests than write requests, which means more users watch videos than upload them to their channel. Note that we use a write-expensive approach where the data is processed while the video is being uploaded. Hence, users can directly retrieve the preprocessed data, and reading data is much faster than writing data. We further improve performance by reducing latency using the following techniques:

  • Caching: The ETags are used to identify a specific version of a response object. When we transfer data in segments (streams), we can consider each segment as a different version of the same object, which can be cached using ETag values (also see the caching at different layers lesson).

  • Adaptive bitrate: This allows clients to intelligently request segments at the specific quality (144p up to 4K) they need, according to the available bandwidth and network congestion.

  • Prefetching: This helps achieve a smooth user experience and adds a cushion for buffering content and withstanding delays.

  • Compression: Clients can compress the response to save some bandwidth by adding Accept-Encoding: gzip, deflate, br to the request.
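The ETag caching and compression points above can be combined on the client side. The sketch below (a simplified model with illustrative names, not a real HTTP library) shows a client that advertises compressed encodings, sends the cached ETag in `If-None-Match`, and reuses its cached copy when the server answers `304 Not Modified`.

```python
# Client-side cache keyed by URL: url -> (etag, body).
cache = {}

def build_request_headers(url):
    """Ask for compression and, if we have a cached copy, validate it by ETag."""
    headers = {"Accept-Encoding": "gzip, deflate, br"}
    if url in cache:
        headers["If-None-Match"] = cache[url][0]
    return headers

def handle_response(url, status, etag, body):
    """304 means our cached version is still valid; otherwise cache the new one."""
    if status == 304:
        return cache[url][1]          # reuse the cached segment, no re-download
    cache[url] = (etag, body)         # store the new version under its ETag
    return body
```

A 304 response carries no body, so a revalidation costs roughly one RTT instead of RTT plus the full download time, which is a large saving for multi-megabyte segments.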

Achieving Non-Functional Requirements

| Non-Functional Requirements | Approaches |
|---|---|
| Scalability | Transmitting data through CDNs and GGC; asynchronous back-end communication; sharding databases for efficient data management; versioning the API to support new features |
| Availability | Regionally located edge servers at the ISP level; circuit breakers for fault tolerance; replication to avoid a single point of failure (SPOF) |
| Flexibility/adaptability | Transcoding into the different formats supported by different devices; ABR and buffering based on device capabilities; user playback history synchronization across devices |
| Security | TLS 1.3 for improved transport security; authenticated and authorized access to private data; OAuth 2.0 with the PKCE code flow for third-party access |
| Low latency | Dividing video into segments/chunks for quick delivery; prefetching segments using buffering at the client's end; caching at the streaming server along with CDNs; compression and encoding techniques; ABR for adapting to dynamic network conditions |

Latency budget#

This section calculates the response time for fetching audio and video clips from the YouTube API. We can calculate the response time as follows:

  1. Calculate the message size for the request and response.

  2. Calculate the response time based on the estimated message size.

As discussed in the Back-of-the-envelope Calculations for Latency chapter, in the case of GET, the average RTT remains the same regardless of the data size (because the request itself is small), while the time to download the response grows by 0.4 ms per KB.

The manifest file#

The YouTube client requests the manifest file through GET before streaming content. Let's assume the request and response message sizes for the manifest file.

  • Request size: The GET request is assumed to be 1 KB since it will only contain a few parameters like video ID, user credentials (if required), client type, and so on.

  • Response size: We assume the response is roughly 35 KB, based on the data inside the manifest file. It usually includes adaptation sets, representations, durations, segment information, and so on, for the player to play the content.

$Response\ size = 35\ KB$

Point to Ponder

Question

Can we also perform segmentation on the manifest file?

Answer

Yes, it is possible (although not common). To split the manifest file into multiple parts, we may have to add a small repeating piece of information to every chunk to identify the original video. These chunks can then be pushed together by the server, or clients can request them individually on demand.

Response time#

Considering a response size of 35 KB, the following calculation estimates the response time for obtaining the manifest file:

  • Minimum latency: 204.5 ms
  • Maximum latency: 285.5 ms
  • Minimum response time: 208.5 ms
  • Maximum response time: 289.5 ms

Assuming the response size is 35 KB, the latency is calculated by:

$Time_{latency\_min} = Time_{base\_min} + RTT_{get} + 0.4 \times size\ of\ response\ (KB) = 120.5 + 70 + 0.4 \times 35 = 204.5\ ms$

$Time_{latency\_max} = Time_{base\_max} + RTT_{get} + 0.4 \times size\ of\ response\ (KB) = 201.5 + 70 + 0.4 \times 35 = 285.5\ ms$

Similarly, the response time is calculated using the following equation:

$Time_{Response} = Time_{latency} + Time_{processing}$

For the minimum response time, we use the minimum values of base time and processing time:

$Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min} = 204.5\ ms + 4\ ms = 208.5\ ms$

For the maximum response time, we use the maximum values of base time and processing time:

$Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 285.5\ ms + 4\ ms = 289.5\ ms$
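The manifest-file numbers can be reproduced directly from the formula above. The constants below (base times, GET RTT, 0.4 ms per KB, 4 ms processing) are taken from this lesson's calculation:

```python
# latency = base time + GET RTT + 0.4 ms per KB of response
# response time = latency + processing time

RTT_GET = 70.0                       # ms, average RTT for a GET request
BASE_MIN, BASE_MAX = 120.5, 201.5    # ms, minimum/maximum base time
PROCESSING = 4.0                     # ms, server processing time

def latency_ms(size_kb, base_ms):
    return base_ms + RTT_GET + 0.4 * size_kb

def response_time_ms(size_kb, base_ms):
    return latency_ms(size_kb, base_ms) + PROCESSING
```

Plugging in the 35 KB manifest gives 204.5 ms and 285.5 ms latency, and 208.5 ms and 289.5 ms response time, matching the derivation above.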

Audio and video segments#

Audio and video segments are also retrieved using the GET method, and the player can receive multiple clips concurrently thanks to HTTP multiplexing. Let's calculate the average response time for one segment below.

Request and response size#

Let's assume that, on average, we receive a video segment of 1560.78 KB and an audio segment of 432.76 KB. We can reuse the request size assumed in the manifest-file example, since all of these are standard GET requests and their sizes don't vary much.

$Response\ size_{video} = 1560.78\ KB$

$Response\ size_{audio} = 432.76\ KB$

Response time#

Let's take a video segment as an example, since it is larger and dominates the overall user-perceived latency. For a video segment of 1560.78 KB, the estimated minimum and maximum times are:

  • Minimum latency: 814.812 ms
  • Maximum latency: 895.812 ms
  • Minimum response time: 818.812 ms
  • Maximum response time: 964.812 ms
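The latency figures for the video segment follow from the same budget formula used for the manifest file (base time + 70 ms GET RTT + 0.4 ms per KB):

```python
# Latency for an average 1560.78 KB video segment, using the lesson's
# base times of 120.5 ms (minimum) and 201.5 ms (maximum).

def segment_latency_ms(size_kb, base_ms):
    return base_ms + 70.0 + 0.4 * size_kb

min_latency = segment_latency_ms(1560.78, 120.5)   # about 814.812 ms
max_latency = segment_latency_ms(1560.78, 201.5)   # about 895.812 ms
```

At this size, the download term (0.4 × 1560.78 ≈ 624 ms) dwarfs both the RTT and the base time, which is why segment size is the main lever for reducing streaming latency.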

A summary of the overall latency budget for streaming video segments using our YouTube API is shown in the illustration below:

Response time for video segment of the YouTube API
Response time for video segment of the YouTube API

Optimization and tradeoffs#

Real-time transmissions, such as live broadcasts, and short-clip feeds, where users quickly swipe between videos, can be tricky to manage because they have a very low tolerance for latency. Here, we can achieve low latency using the following techniques:

  • Compromising video quality: We can send lower-resolution segments first and switch to high-resolution segments once some playback time has been buffered.

  • Reducing the segment length: We can send high-resolution fragments by reducing the segment length to receive small-sized, high-quality segments in real time. However, reducing the segment length can cause performance degradation in the compression algorithm. This is because large-sized segments can have a higher degree of redundancy and, therefore, result in better compression.

  • Prefetching the next segment: Instead of waiting for the current video to finish playing, we can prefetch the next video in advance. While this doesn't reduce the latency of the first video, it does reduce the latency of subsequent ones.

We can implement all of the above techniques by taking a hybrid approach and making tradeoffs between video quality, fragment size, and buffer size to prefetch segments before playback.
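The quality-versus-latency tradeoff above can be sketched as a simple ABR selection rule. This is an illustrative heuristic (the rendition ladder, safety factor, and thresholds are made-up values, not YouTube's actual algorithm): pick the highest rendition that fits the measured throughput, but fall back to a safer tier when the playback buffer is nearly empty.

```python
# (resolution, required Mbps) pairs; representative values for illustration.
RENDITIONS = [(144, 0.1), (360, 0.7), (720, 2.5), (1080, 5.0), (2160, 16.0)]

def pick_rendition(throughput_mbps, buffer_s, safety=0.8, low_buffer_s=5):
    """Return the resolution to request for the next segment."""
    budget = throughput_mbps * safety   # leave headroom for throughput variance
    if buffer_s < low_buffer_s:
        budget *= 0.5                   # buffer nearly empty: trade quality for safety
    best = RENDITIONS[0]                # always have a lowest-quality fallback
    for res, mbps in RENDITIONS:
        if mbps <= budget:
            best = (res, mbps)
    return best[0]
```

With a healthy buffer and 10 Mbps of throughput this picks 1080p, but the same connection with a near-empty buffer drops to 720p, mirroring the "lower resolution first, upgrade once buffered" strategy described above.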

Point to Ponder

Question

How can we keep the buffering time of the initial segments to a minimum?

Answer

We can reduce buffering time by:

  • Pushing the initial audio or video segments along with the manifest file, without waiting for the player to request again.

  • Sending low-resolution segments initially, and then improving the quality according to the client's bandwidth.

  • Prefetching the initial segments by intelligently guessing the next video to be played.

Note: These tricks occasionally backfire, although they work well most of the time. For example, if we send the initial segments in a low resolution but the user sets the video quality to the highest available resolution, the prefetched data is wasted.


In this lesson, we discussed how our API meets the non-functional requirements described in the first lesson of this chapter. We also estimated the average response time for our API. Lastly, we went through some scenarios where our API could be adapted to handle near-real-time events. In the next lesson, we'll exercise what we learned via a quiz on TikTok (a social media platform centered on short-form video streaming).
